---
id: metrics-mcp-use
title: MCP-Use
sidebar_label: MCP-Use
---

<head>
  <link rel="canonical" href="https://deepeval.com/docs/metrics-mcp-use" />
</head>

import Equation from "@site/src/components/Equation";
import MetricTagsDisplayer from "@site/src/components/MetricTagsDisplayer";

<MetricTagsDisplayer singleTurn={true} referenceless={true} />

The MCP Use metric evaluates how effectively an **MCP-based LLM agent makes use of the MCP servers it has access to**. It uses LLM-as-a-judge to evaluate the MCP primitives called as well as the arguments generated by the LLM app.

## Required Arguments

To use the `MCPUseMetric`, you'll have to provide the following arguments when creating an [`LLMTestCase`](https://www.deepeval.com/docs/evaluation-test-cases):

- `input`
- `actual_output`
- `mcp_servers`

If your LLM app called any MCP primitives, you'll also need to supply them through `mcp_tools_called`, `mcp_resources_called`, and `mcp_prompts_called` so they can be evaluated. Click here to learn about [how it is calculated](#how-is-it-calculated).

## Usage

The `MCPUseMetric` can be used on a single-turn `LLMTestCase` with MCP parameters. Click here to see [how to create an MCP single-turn test case](https://www.deepeval.com/docs/evaluation-mcp#single-turn).

```python
from deepeval import evaluate
from deepeval.metrics import MCPUseMetric
from deepeval.test_case import LLMTestCase, MCPServer

test_case = LLMTestCase(
    input="...", # Your input here
    actual_output="...", # Your LLM app's final output here
    mcp_servers=[MCPServer(...)], # Your MCP server's data
    # MCP primitives used (if any)
)

metric = MCPUseMetric()

# To run metric as a standalone
# metric.measure(test_case)
# print(metric.score, metric.reason)

evaluate([test_case], [metric])
```

There are **SIX** optional parameters when creating an `MCPUseMetric`:

- [Optional] `threshold`: a float representing the minimum passing threshold, defaulted to 0.5.
- [Optional] `model`: a string specifying which of OpenAI's GPT models to use, **OR** [any custom LLM model](/docs/metrics-introduction#using-a-custom-llm) of type `DeepEvalBaseLLM`. Defaulted to 'gpt-4o'.
- [Optional] `include_reason`: a boolean which when set to `True`, will include a reason for its evaluation score. Defaulted to `True`.
- [Optional] `strict_mode`: a boolean which when set to `True`, enforces a binary metric score: 1 for perfection, 0 otherwise. It also overrides the current threshold and sets it to 1. Defaulted to `False`.
- [Optional] `async_mode`: a boolean which when set to `True`, enables [concurrent execution within the `measure()` method.](/docs/metrics-introduction#measuring-a-metric-in-async) Defaulted to `True`.
- [Optional] `verbose_mode`: a boolean which when set to `True`, prints the intermediate steps used to calculate said metric to the console, as outlined in the [How Is It Calculated](#how-is-it-calculated) section. Defaulted to `False`.
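
To make the interaction between `threshold` and `strict_mode` concrete, here is a minimal sketch in plain Python. It is not `deepeval`'s implementation: the helper names `effective_threshold`, `final_score`, and `passes` are hypothetical, and it assumes a test case passes when its score is greater than or equal to the threshold.

```python
def effective_threshold(threshold: float = 0.5, strict_mode: bool = False) -> float:
    # strict_mode overrides whatever threshold was supplied and sets it to 1
    return 1.0 if strict_mode else threshold


def final_score(raw_score: float, strict_mode: bool = False) -> float:
    # In strict mode the score is binary: 1 for perfection, 0 otherwise
    if strict_mode:
        return 1.0 if raw_score >= 1.0 else 0.0
    return raw_score


def passes(raw_score: float, threshold: float = 0.5, strict_mode: bool = False) -> bool:
    # Assumption: passing means score >= threshold
    score = final_score(raw_score, strict_mode)
    return score >= effective_threshold(threshold, strict_mode)
```

Under these assumptions, a raw score of 0.7 passes with the default settings, while the same score with `strict_mode=True` is reduced to 0 and fails.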

### As a standalone

You can also run the `MCPUseMetric` on a single test case as a standalone, one-off execution.

```python
...

metric.measure(test_case)
print(metric.score, metric.reason)
```

:::caution
This is great for debugging or if you wish to build your own evaluation pipeline, but you will **NOT** get the benefits (testing reports, Confident AI platform) and all the optimizations (speed, caching, computation) the `evaluate()` function or `deepeval test run` offers.
:::

## How Is It Calculated

The `MCPUseMetric` score is calculated according to the following equation:

<Equation formula="\text{MCP Use Score} = \text{AlignmentScore(Primitives Used, Primitives Available)}" />

The **AlignmentScore** is judged by an evaluation model based on which primitives were called and their generated arguments with respect to the user's input.

:::info
The `MCPUseMetric` evaluates whether the right tools have been called with the right parameters. If none of `mcp_tools_called`, `mcp_resources_called`, or `mcp_prompts_called` are provided, it instead evaluates whether calling any of the available primitives would have been better.
:::
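
The branching described in the note above can be sketched as follows. This is an illustration only, not `deepeval`'s internals: `evaluation_focus` and its return labels are hypothetical names, and it assumes the fields in question are `mcp_tools_called`, `mcp_resources_called`, and `mcp_prompts_called`.

```python
from typing import Optional, Sequence


def evaluation_focus(
    tools_called: Optional[Sequence] = None,
    resources_called: Optional[Sequence] = None,
    prompts_called: Optional[Sequence] = None,
) -> str:
    # Collect every primitive the LLM app reported calling
    called = [
        *(tools_called or []),
        *(resources_called or []),
        *(prompts_called or []),
    ]
    if called:
        # Judge the primitives that were called and their arguments
        return "primitives_used"
    # Nothing was called: judge whether an available primitive should have been
    return "missed_primitives"
```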
